(2.1.2) ILP History, Overview and Prespective

Joseph A. Fisher and B. Ramakrishna Rau. Instruction-Level Parallel Processing, Science, 13 September 1991, pp. 1233-1241. Science Link

1992

For ILP 
  1. The dependences between operations must be determined
  2. The operations that are independent of any other operatation that has not as yet completed must be determined
  3. These independent operations must be scheduled to execute at some particular time, on some specific funct unit, and must be assigned a register to which the result may be deposited. 


Hardware and Software Techniques for ILP Exec
HW
Pipelining / multiple functional units 
+ ILP increase 
-  interconn network/reg files goes up as square of funct units.
Superpipelining > eg : int ALU is pipelined. 
Superscalar : multiple instruction issue of seq programs
Horizontal microprogramming > 
Compiler decides
VLIW issues
 
Problem : small size of basic blocks
Dynamic schemes for speculative execution
terminate unnec speculative computation once the branch has been resolved
undo the effects of operation that should not been executed
ensure no exceptions are reported
preserve enuf exec state to resume
 
Compile time > trace scheduling
 
Branch prediction to guide speculative scheduling
gather execution statistics
compile time profiling (Trace sched/superblock sched)
Dynamic branch predition ++
Predicted execution : to selectively squash. 
 
ILP Compilation
Scheduling > NP complete problem
Only a problem on VLIW kind processors (non sequential)
Local Scheduling
sheduling barrier is assumed to exist between adjacent basic blocks in control flow graph
(Local code compaction)
+ computationally inexpensive (one-pass)
+ near optimal
 
Global Acyclic Scheduling
Trace Sched
compiler often can, by rearranging its generated machine instructions for faster execution, improve program performance. Trace scheduling is one of many known techniques for doing so.

Trace scheduling was originally developed for Very Long Instruction Word, or VLIW machines, and is a form of global code motion. It works by converting a loop to long straight-line code sequence using loop unrolling and static branch prediction. This process separates out "unlikely" code and adds handlers for exits from trace. The goal is to have the most common case executed as a sequential set of instructions without branches.

SuperBlock Schel 

a simplified form of trace scheduling which does not attempt to merge control flow paths at trace "side entrances". Instead, code can be implemented by more than one schedule, vastly simplifying the code generator
Cyclic Scheduling 
Software pipelining
Take loops, unroll them find repeating patterns and reroll them. modulo constraint : can be because of initiation interval or Resource constraints. 
>> wiki : Instruction scheduling
Register Allocation
register allocation first?
+ adequate registers more important than optimizing schedule. 
+ shorter schedules > higher register pressure. 
- Antidependences and output dependences 
prepass scheduling (sched before reg allocation)
graph coloring
best : schedule first, register allocate, shedule again.